Skip to content

feat: add Sandbox abstraction for agent code execution environments#1968

Draft
agent-of-mkmeral wants to merge 1 commit intostrands-agents:mainfrom
mkmeral:feat/sandbox-abstraction
Draft

feat: add Sandbox abstraction for agent code execution environments#1968
agent-of-mkmeral wants to merge 1 commit intostrands-agents:mainfrom
mkmeral:feat/sandbox-abstraction

Conversation

@agent-of-mkmeral
Copy link

Summary

Add the Sandbox interface that decouples tool logic from where code runs. Tools that need to execute code or access a filesystem receive a Sandbox instead of managing their own execution, enabling portability across local, Docker, and cloud environments.

Related: Design Doc PR #681

What Changed

New Files (4 source + 5 test)

File Lines Purpose
src/strands/sandbox/__init__.py 22 Module exports
src/strands/sandbox/base.py 263 Sandbox ABC with streaming AsyncGenerator interface, ExecutionResult, convenience methods (read_file, write_file, list_files, execute_code), auto-start lifecycle
src/strands/sandbox/local.py 161 LocalSandbox — native file I/O overrides, asyncio subprocess execution
src/strands/sandbox/docker.py 354 DockerSandbox — container lifecycle, stdin piping for safe file writes, docker exec streaming
tests/strands/sandbox/test_base.py 239 22 tests for ABC, ExecutionResult, convenience methods, lifecycle, edge cases
tests/strands/sandbox/test_local.py 201 17 tests for real command execution, file I/O, streaming, timeouts, code execution
tests/strands/sandbox/test_docker.py 314 18 tests for Docker lifecycle, container management, file ops (all mocked)
tests/strands/sandbox/test_agent_sandbox.py 49 6 tests for Agent integration

Modified Files (2)

File Change
src/strands/__init__.py Added exports for Sandbox, LocalSandbox, DockerSandbox, ExecutionResult
src/strands/agent/agent.py Added sandbox parameter to Agent.__init__, defaults to LocalSandbox()

Design Decisions

✅ IN core SDK

  • Sandbox ABC — the interface contract
  • LocalSandbox — zero-dependency host execution (default)
  • DockerSandbox — containerized execution
  • ExecutionResult dataclass
  • Agent integration (sandbox parameter)

❌ NOT in core SDK (for now)

  • AgentCoreSandbox — External dependency on bedrock-agentcore package. Should live in a separate package
  • Sandbox tools (run_command, python_tool, programmatic_tool_caller) — These are tools, not core abstractions. They belong in strands-agents/tools
  • CodeActPlugin — P2 in the design doc, not yet ready

Rationale: The core SDK should provide the minimal, zero-external-dependency abstraction. Tools and cloud-specific sandbox implementations have their own dependency graphs and release cycles.

Key Design Details

Streaming Interface

async for chunk in sandbox.execute("echo hello"):
    if isinstance(chunk, str):
        print(chunk, end="")  # stream output as it arrives
    elif isinstance(chunk, ExecutionResult):
        print(f"Exit: {chunk.exit_code}")  # final result

Agent Integration

from strands import Agent
from strands.sandbox import DockerSandbox

agent = Agent(sandbox=DockerSandbox(image="python:3.12-slim"))
# Tools access via: tool_context.agent.sandbox

Security

  • Heredoc injection → randomized secrets.token_hex(8) delimiter
  • Path injectionshlex.quote() for all paths
  • Shell escapingshlex.quote() for code arguments
  • DockerSandbox → stdin piping instead of heredoc for file writes

Tests

  • 76 new tests, all passing
  • 1686 existing tests still passing, 0 regressions

What's Left (Outside This PR)

  1. AgentCoreSandbox — needs its own package with bedrock-agentcore dependency
  2. Sandbox tools — should go to strands-agents/tools
  3. CodeActPlugin — P2 per design doc
  4. Docker integration tests — need Docker in CI

cc @mkmeral

Add the Sandbox interface that decouples tool logic from where code runs.
Tools that need to execute code or access a filesystem receive a Sandbox
instead of managing their own execution, enabling portability across
local, Docker, and cloud environments.

Core components:
- Sandbox ABC with streaming AsyncGenerator interface (base.py)
- LocalSandbox for host-process execution via asyncio subprocesses (local.py)
- DockerSandbox for containerized execution via docker exec (docker.py)
- Agent integration: sandbox parameter on Agent.__init__, defaults to LocalSandbox

Key design decisions:
- Only core abstractions in SDK; AgentCoreSandbox and sandbox tools belong
  in separate packages (external dependencies, different release cycles)
- Streaming output via AsyncGenerator[str | ExecutionResult] - yields lines
  as they arrive, ExecutionResult as the final yield
- Security: randomized heredoc delimiters, shlex.quote for all paths,
  stdin piping in DockerSandbox to prevent injection
- Auto-start lifecycle: sandbox starts on first execute() call
- Zero external dependencies for core sandbox package

Tests: 76 new tests (base: 22, local: 17, docker: 18, agent: 6, + shared)
All 76 sandbox tests passing, 1686 existing tests still passing.
Copy link
Author

@agent-of-mkmeral agent-of-mkmeral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Testing Result: FAIL — 3 bugs found (1 security, 1 crash, 1 code correctness)

Scope: Full adversarial testing of Sandbox abstraction (base.py, local.py, docker.py, agent integration). Tested shell injection, path traversal, heredoc injection, concurrency, lifecycle edge cases, timeout cleanup, type safety, and contract verification.
Tests written: 67 (61 adversarial + 6 confirmation)
Tests passing: 63
Tests failing (findings): 4 (3 genuine bugs, 1 test infrastructure)

Findings Summary
# Category Severity Description
1 Security Issue Critical execute_code() language parameter injection — unsanitized
2 Bug High Long output lines (>64KB) crash readline() with ValueError
3 Bug Medium Non-existent working directory raises unhandled FileNotFoundError
4 Code Smell Low proc.returncode or 0 masks None as 0 (false success)

Finding 1: execute_code() language parameter injection

Category: Security Issue
Severity: Critical

Reproduction:

sandbox = LocalSandbox(working_dir="/tmp")
marker = "/tmp/injection_marker.txt"
malicious_language = f"touch {marker} #"
await sandbox._execute_code_to_result("print('safe')", language=malicious_language)
assert os.path.exists(marker)  # True — injection succeeded!

Observed behavior: The language parameter is interpolated directly into the shell command without sanitization: f"{language} -c {shlex.quote(code)}". Code is properly shlex.quote()'d, but language is NOT. Any caller passing untrusted input as language enables arbitrary command execution.

Expected behavior: language should be validated against an allowlist (e.g., python, ruby, node) or at minimum shlex.quote()'d.

Fix: Either shlex.quote(language) or validate against an allowlist:

# Option A: Quote it
async for chunk in self.execute(f"{shlex.quote(language)} -c {shlex.quote(code)}", timeout=timeout):
    yield chunk

# Option B: Allowlist (stricter)
ALLOWED_LANGUAGES = {"python", "python3", "ruby", "node", "bash", "sh"}
if language not in ALLOWED_LANGUAGES:
    raise ValueError(f"Unsupported language: {language}")

Finding 2: Long output lines crash readline()

Category: Bug
Severity: High

Reproduction:

sandbox = LocalSandbox(working_dir="/tmp")
await sandbox._execute_to_result("echo " + "A" * 100000)
# Raises: ValueError: Separator is found, but chunk is longer than limit

Observed behavior: asyncio.StreamReader.readline() has a default buffer limit of 64KB. When a command outputs a single line exceeding this, readline() raises ValueError('Separator is found, but chunk is longer than limit'). This propagates as an unhandled exception.

Expected behavior: The sandbox should handle long output lines gracefully. Either increase the buffer limit when creating the subprocess, or catch ValueError/LimitOverrunError in _read_stream.

Fix: Set a larger limit when creating the subprocess:

proc = await asyncio.create_subprocess_shell(
    command,
    cwd=self.working_dir,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    limit=1024 * 1024,  # 1MB line limit
)

Note: asyncio.create_subprocess_shell accepts a limit parameter that's passed to the StreamReader.


Finding 3: Non-existent working directory crashes

Category: Bug
Severity: Medium

Reproduction:

sandbox = LocalSandbox(working_dir="/nonexistent/path")
await sandbox._execute_to_result("echo hello")
# Raises: FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/path'

Observed behavior: asyncio.create_subprocess_shell(cwd="/nonexistent") raises FileNotFoundError which propagates unhandled.

Expected behavior: Either validate the working directory in __init__/start(), or catch the exception in execute() and return an ExecutionResult with a non-zero exit code and descriptive stderr.


Finding 4: proc.returncode or 0 masks None as 0

Category: Code Smell
Severity: Low

Location: local.py:115, docker.py:131, docker.py:262

Issue: The pattern proc.returncode or 0 uses Python's falsy semantics. If returncode is None (process not finished), it maps to 0 (success). The correct pattern is 0 if proc.returncode is None else proc.returncode.

Risk: Low in practice (after proc.wait(), returncode should never be None), but the code is semantically incorrect and fragile.


Other observations (non-findings)

Heredoc injection: Base class write_file uses randomized STRANDS_EOF_{hex} delimiter — resistant to injection. LocalSandbox overrides with native file I/O (even better).

DockerSandbox stdin piping: write_file uses stdin pipe instead of heredoc — immune to shell metacharacter injection.

shlex.quote() on file paths: All path arguments are properly quoted.

Concurrent execution: Multiple concurrent execute() calls work correctly.

Context manager cleanup: Exception during async with properly calls stop().

Docker container cleanup: stop() handles rm failures gracefully.

Timeout process killing: Local and Docker sandboxes properly kill processes on timeout.

⚠️ Path traversal (LocalSandbox): write_file("../escape.txt") writes outside working directory. This is by design for LocalSandbox (host execution), but should be clearly documented as a security boundary. DockerSandbox containment handles this via container isolation.

Artifacts: All adversarial tests provided in the review above.


🤖 AI agent response. Strands Agents. Feedback welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant